An efficient strategy using k-mers to analyse 16S rRNA sequences
Abstract
The use of k-mers has been a successful strategy for improving metagenomic studies, including taxonomic classification and de novo assembly, and k-mers can also be used to retrieve sequences of interest from available databases. The aim of this manuscript was to propose a simple but efficient strategy to generate k-mers and use them to obtain and analyse 16S rRNA sequence fragments in silico. A total of 513,309 bacterial sequences from the SILVA database were considered for the study, and in-house PHP scripts were used to search for specific nucleotide strings, recover fragments of bacterial sequences, perform calculations and organize the information. Consensus sequences matching conserved regions were constructed by aligning most of the primers reported in the literature, and sequences of k nucleotides (9- to 15-mers) were extracted from the resulting primer contigs. Frequency analysis revealed that k-mer size was inversely related to the occurrence of k-mers in the different conserved regions, suggesting a stringency relationship: short k-mers produced high numbers of duplicate reactions, whereas long k-mers recovered a lower proportion of sequences. The best results were obtained with 12-mers. Using 12-mers with the proposed method proved to be a reliable approach for obtaining and analysing 16S rRNA sequences, and the strategy could probably be extended to other biomarkers. Additional applications, such as evaluating the degree of conservation, designing primers and performing other calculations, are also proposed as examples.
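The k-mer workflow summarized above (cut a primer contig into overlapping k-mers, count how many sequences contain each one, and recover the fragment lying between two conserved regions) can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' PHP scripts: every function name is hypothetical, matching is exact (no mismatches and no expansion of IUPAC degenerate bases), input sequences are assumed to be SILVA-style RNA strings (U is converted to T), and both anchor k-mers are assumed to be in the sense orientation of the 16S rRNA (a reverse primer can be converted with `reverse_complement`). The default k = 12 follows the size reported above as giving the best results.

```python
from __future__ import annotations

from collections import Counter

# Hypothetical helpers for illustration only. Assumptions: exact, case-insensitive
# matching; IUPAC degenerate bases (e.g. M, R, Y) are NOT expanded.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def normalize(seq: str) -> str:
    """Upper-case a sequence and convert RNA letters (U) to DNA letters (T)."""
    return seq.upper().replace("U", "T")


def reverse_complement(seq: str) -> str:
    """Reverse complement, e.g. to put a reverse primer in sense orientation."""
    return normalize(seq).translate(_COMPLEMENT)[::-1]


def kmers(contig: str, k: int = 12) -> list[str]:
    """All overlapping k-mers (step 1) of a primer contig, in order."""
    s = normalize(contig)
    return [s[i:i + k] for i in range(len(s) - k + 1)]


def count_hits(seq: str, kmer: str) -> int:
    """Number of (possibly overlapping) positions where kmer occurs in seq.
    Values above 1 are the duplicate matches that short k-mers tend to produce."""
    s, km = normalize(seq), normalize(kmer)
    return sum(1 for i in range(len(s) - len(km) + 1) if s.startswith(km, i))


def kmer_occurrence(kmer_list: list[str], sequences: list[str]) -> Counter:
    """For each k-mer, count how many sequences contain it at least once."""
    counts: Counter = Counter()
    unique = {normalize(km) for km in kmer_list}
    for seq in sequences:
        s = normalize(seq)
        for km in unique:
            if km in s:
                counts[km] += 1
    return counts


def extract_fragment(seq: str, forward_kmer: str, reverse_kmer: str) -> str | None:
    """Return the fragment from the first forward-anchor hit through the first
    downstream reverse-anchor hit (both anchors included), or None if either is absent."""
    s = normalize(seq)
    fwd, rev = normalize(forward_kmer), normalize(reverse_kmer)
    start = s.find(fwd)
    if start == -1:
        return None
    end = s.find(rev, start + len(fwd))
    if end == -1:
        return None
    return s[start:end + len(rev)]


if __name__ == "__main__":
    contig = "GTGCCAGCAGCCGCGGTAA"  # illustrative 19-nt consensus, not taken from the paper
    print(len(kmers(contig, 12)))  # 19 - 12 + 1 = 8 overlapping 12-mers
```

Repeating `kmer_occurrence` for k from 9 to 15 over the same set of sequences is one way to explore the stringency trade-off described above: because every occurrence of a longer k-mer also contains an occurrence of each of its shorter substrings, shorter k-mers can only match as many or more sequences (and positions, as `count_hits` shows), while longer ones are more specific but may miss sequences.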